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Abstract 

In many human brain network studies, we do not have sufficient 
number (n) of images relative to the number (p) of voxels due to the 
prohibitively expensive cost of scanning enough subjects. Thus, brain 
network models usually suffer the small-n large-p problem. Such a 
problem is often remedied by sparse network models, which are usually 
solved numerically by optimizing Ll-penalties. Unfortunately, due to 
the computational bottleneck associated with optimizing Ll-penalties, 
it is not practical to apply such methods to construct large-scale brain 
networks at the voxel-level. In this paper, we propose a new scalable 
sparse network model using cross-correlations that bypass the compu¬ 
tational bottleneck. Our model can build sparse brain networks at the 
voxel level with p > 25000. Instead of using a single sparse parameter 
that may not be optimal in other studies and datasets, the computa¬ 
tional speed gain enables us to analyze the collection of networks at 
every possible sparse parameter in a coherent mathematical framework 
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via persistent homology. The method is subsequently applied in deter¬ 
mining the extent of heritability on a functional brain network at the 
voxel-level for the first time using twin fMRI. 


1 Introduction 

Large-scale brain networks. In brain imaging, there have been many at¬ 
tempts to identify high-dimensional imaging features via multivariate ap¬ 
proaches including network features (Chung et ah, 2013; Lerch et ah, 2006; 
He et ah, 2007; Worsley et ah, 2005a; Rao et ah, 2008; Cao and Worsley, 
1999; He et ah, 2008). Specifically, when the number of voxels (often denoted 
as p) are substantially larger than the number of images (often denoted as 
n), it produces an under-determined model with infinitely many possible 
solutions. The small-n large-p problem is often remedied by regularizing the 
under-determined system with additional sparse penalties. Popular sparse 
models include sparse correlations (Lee et ah, 2011; Chung et ah, 2013, 
2015), LASSO (Bickel and Levina, 2008; Peng et ah, 2009; Huang et ah, 
2009; Chung et ah, 2013), sparse canonical correlations (Avants et ah, 2010) 
and graphical-LASSO (Banerjee et ah, 2006, 2008; Friedman et ah, 2008; 
Huang et ah, 2009, 2010; Mazumder and Hastie, 2012; Witten et ah, 2011). 
Most of these sparse models require optimizing L 1-norm penalties, which 
has been the major computational bottleneck for solving large-scale prob¬ 
lems in brain imaging. Thus, almost all sparse brain network models in 
brain imaging have been restricted to a few hundreds nodes or less. As far 
as we are aware, 2527 MRI features used in a LASSO model for Alzheimer’s 
disease (Xin et ah, 2015) is probably the largest number of features used in 
any sparse model in the brain imaging literature. 

In this paper, we propose a new scalable large-scale sparse network model 
{p > 25000) that yields greater computational speed and efficiency by by¬ 
passing the computational bottleneck of optimizing Ll-penalties. There are 
few previous studies at speeding up the computation for sparse models. By 
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identifying block diagonal structures in the estimated (inverse) covariance 
matrix, it is possible to reduce the computational burden in the penalized 
log-likelihood method (Mazumder and Hastie, 2012; Witten et ah, 2011). 
However, the proposed method substantially differs from Mazumder and 
Hastie (2012) and Witten et ah (2011) in that we do not need to assume 
that the data to follow Gaussianness. Subsequently, there is no need to spec¬ 
ify the likelihood function. Further, the cost functions we are optimizing are 
different. Specihcally, we propose a novel sparse network model based on 
cross-correlations. Although cross-correlations are often used in sciences in 
connection to times series and stochastic processes (Worsley et ah, 2005a,b), 
the sparse version of cross-correlation has been somewhat neglected. 

Persistent network homology. Any sparse model ^(A) is usually param¬ 
eterized by a tuning parameter A that controls the sparsity of the repre¬ 
sentation. Increasing the sparse parameter makes the solution more sparse. 
Sparse models are inherently multiscale, where the scale of the models is 
determined by the sparse parameter. However, many existing sparse net¬ 
work approaches use a hxed parameter A that may not be optimal in other 
datasets or studies (Chung et ah, 2013; Lee et ah, 2012). Depending on the 
choice of the sparse parameter, the final classihcation and statistical results 
will be totally different (Lee et ah, 2012; Chung et ah, 2013, 2015). Thus, 
there is a need to develop a multiscale network analysis framework that 
provides a consistent statistical results and interpretation regardless of the 
choice of parameter. Persistent homology, a branch of algebraic topology, 
offers one possible solution to the multiscale problem (Carlsson and Mem- 
oli, 2008; Edelsbrunner and Harer, 2008). Instead of looking at images and 
networks at a fixed scale, as usually done in traditional analysis approaches, 
persistent homology observes the changes of topological features over differ¬ 
ent scales and finds the most persistent topological features that are robust 
under noise perturbations. This robust performance under different scales 
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is needed for most network models that are parameter and scale dependent. 

In this paper, instead of building networks at one fixed sparse parameter 
that may not be optimal, we propose to build the collection of sparse mod¬ 
els over every possible sparse parameter using a graph filtration, a persistent 
homological construct (Chung et ah, 2013; Lee et ah, 2012). The graph 
filtration is a threshold-free framework for analyzing a family of graphs but 
requires hierarchically building specific nested subgraph structures. The pro¬ 
posed method share some similarities to the existing multi-thresholding or 
multi-resolution network models that use many different arbitrary thresh¬ 
olds or scales (Achard et ah, 2006; He et ah, 2008; Kim et ah, 2015; Lee 
et ah, 2012; Supekar et ah, 2008). However, such approaches have not often 
applied to sparse models even in an ad-hoc fashion before. Additionally, 
building sparse models for multiple sparse parameters can causes an addi¬ 
tional computational bottleneck to the already computationally demanding 
problem as well as introducing an additional problem of choosing optimal 
parameters. The proposed persistent homological approach can address all 
these shortcomings within a unified mathematical framework. 

Heritability of brain networks. Many brain imaging studies have shown 
the widespread heritability of neuroanatomical structures in magnetic res¬ 
onance imaging (MRI) (McKay et ah, 2014) and diffusion tensor imaging 
(DTI) (Chiang et ah, 2011). However, these structural imaging studies 
use univariate imaging phenotypes such as cortical thickness and fractional 
anisotropy (FA) in determining heritability at each voxel or regions of in¬ 
terest. There are also few fMRI and EEG studies that use functional acti¬ 
vations as possible imaging phenotypes at each voxel (Blokland et ah, 2011; 
Glahn et ah, 2010; Smit et ah, 2008). Compared to many existing studies 
on univariate phenotypes, there are not many studies on the heritability 
of brain networks (Blokland et ah, 2011). Measures of network topology 
and features may be worth investigating as intermediate phenotypes or en- 
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dophenotypes, that indicate the genetic risk for a neuropsychiatric disorder 
(Bullmore and Sporns, 2009). However, the brain network analysis has not 
yet been adapted for this purpose. Determining the extent of heritability of 
brain networks with large number of nodes is the first necessary prerequisite 
for identifying network-based endophenotypes. This requires constructing 
the large-scale brain networks by taking every voxel in the brain as network 
nodes with at least a billion connections, which is a serious computational 
burden. 

Motivated by these neuroscientific and methodological needs, we pro¬ 
pose to develop algorithms for building large-scale brain networks based on 
sparse cross-correlations. The proposed sparse network model is used in 
constructing large-scale brain networks at the voxel level in fMRI consisting 
of monozygotic (MZ) and dizygotic (DZ) twin pairs. The twin study design 
in brain imaging offers a very effective way of determining the heritability 
of the multiple features of the human brain. Heritability (broad sense) can 
be measured by comparing the resemblance among monozygotic (MZ) twins 
and dizygotic (DZ) twins using Falconer’s method (Falconer and Mackay, 
1995; Tenesa and Haley, 2013) at the voxel-level. As a way of quantify¬ 
ing the genetic contribution of phenotypes, the heritability index (HI) has 
been extensively used in twin studies. Heritability index (HI) is the the 
amount of genetic variations in a phenotype. For instance, 60% HI implies 
the phenotype is 60% heritable and 40% environmental. Except for few 
well known neural circuits (Glahn et ah, 2010; van den Heuvel et ah, 2013), 
the extent to which heritability influences functional brain networks at the 
voxel-level is not well established. Although HI is a well formulated concept, 
it is unclear how to best apply HI to networks. In this paper, we generalize 
the voxel-wise univariate heritability index (HI) into a network-level multi¬ 
variate feature called heritability graph index (HGI), which is subsequently 
used in determining the genetic effects at the network-level. The proposed 
framework is used to determine the first-ever voxel-level heritability map of 
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functional brain network. 


2 Methods 


2.1 Sparse Cross-Correlations 

Let V = {ui,-- - ,Vp} be a node set where data is observed. We expect 
the number of nodes p to be significantly larger than the number of images 
n, i.e., p ^ n. In this study, we have p = 25972 voxels in the brain that 
serves as the node set. Let Xk{vi) and yk{vi) be the fc-th paired scalar 
measurements at node Vi. Denote x(uj) = {xi{vi), ■ ■ ■ ,Xn{vi)y and y{vi) = 
(yi(ui),--- ,ynivi)y be the paired data vectors over n different images at 
voxel Vi- Center and scale x and y such that 

n n 

"^Xhivi) = '^yk{vi) = 0 , 

k=l k=l 

||x(ui)f = x'{vi)x{vi) = ||y(ui)f = y'(ui)y(ui) = 1 

for all Vi- The reasons for centering and scaling will soon be obvious. We 
set up a linear model between x(uj) and y{vj): 

y(vj) = bij x{vi) + e, (1) 


where e is the zero-mean error vector whose components are independent 
and identically distributed. Since the data are all centered, we do not have 
the intercept in linear regression (1). The least squares estimation (LSE) of 
bij that minimizes the L2-norm 

p 

II y{xj) - bij x{vi) f (2) 


is given by 


bij = x'{vi)y{vj), 


(3) 
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which are the (sample) cross-correlations (Worsley et ah, 2005a,b). The 
cross-correlation is invariant under the centering and scaling operations. 
The sparse version of L2-norm (2) is given by 


p p 

F(/3;x,y,A) = ^'^ \\ y{vj) - Pij y^{vi) f +A ^ \/3ij\. (4) 

i,j=l *J=1 

The sparse cross-correlation is then obtained by minimizing over every pos¬ 
sible I3ij G M: 

^(A) = argmmF(/3;x,y, A). (5) 

The estimated sparse cross-correlations /3(A) = (/3ij(A)) shrink toward zero 
as sparse parameter A > 0 increases. The direct optimization of (4) for large 
p is computationally demanding. However, there is no need to optimize (4) 
numerically using the coordinate descent learning or the active-set algorithm 
as often done in sparse optimization (Peng et ah, 2009; Friedman et ah, 
2008). We can show that the minimization of (4) is simply done algebraically. 


Proposition 1. For A > 0, the minimizer of F{/3;x,y, X) is given by 


A.'(A) 


/ 

x'(ui)y(uj) - A 
< 0 

,x'(ui)y(uj) -h A 


if x'(uj)y(uj) > A 
*/ |x'(ui)y(uj)| < A ■ 
if x'(uj)y(uj) < -A 


( 6 ) 


See Appendix for proof. Although it is not obvious. Proposition 1 is 
related to the orthogonal design in LASSO (Tibshirani, 1996) and the soft- 
shrinkage in wavelets (Donoho et ah, 1995). To see this, let us transform 
linear equations (1) into a index-free matrix equation: 


y(^^i) • 

• y(^^i) 


6iix(ui) 

621 x(u 2 ) • 

• bpix{vp) 


e • 

e 

y(^^2) • 

• y('i'2) 

= 

6i2x(ui) 

522x(u2) • 

■ bp2x{Vp) 

-h 

e • 

e 

_ y(^p) • 

• y(^p) . 


6ipx(ui) 

^2pX(u2) • 

6ppx('Up) 


e • 

e 
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Equation (7) can be vectorized as follows. 


y(^^i) 


yivp) 



= 

y{vi) 


_ yivp) _ 



x(ui) 


0 


x(ui) 


0 


x(Up) 


0 


(8) can be written in a more compact form. Let 

X„xp = [x(i;i) x(u 2 ) • • • x(up)] 
Ynxp = [y(^^i) y(^^2) • • • y('i'p)] 

1 1 ••• 1 


^axb — 


1 1 ••• 1 


-* axb 


x(Up 


Then (8) can be written as 

Ipxi 0 uec(Y) = X„p 2 xp 2 vec{b) + l„p2xi ® e, 



' bn ■ 


e 


bpl 


e 



+ 



^Ip 


e 


r ■ 

1_ 


e 


• ( 8 ) 


(9) 


where vec is the vectorization operation. The block diagonal design matrix 
X consists of p diagonal blocks Ip ® x(ui), • • • , Ip (8) x(up), where Ip is p x p 
identity matrix. Subseqeuntly, X'X is again a block diagonal matrix, where 
the i-th block is 


[Ip ® x{Vi)]'[Ip (g) x(Ui)] = Ip (g) [x(Ui)'x(Ui)] = Ip. 

Thus, X is an orthogonal design. However, our formulation is not exactly 
the orthogonal design of LASSO as specified in Tibshirani (1996) since the 
noise components in (9) are not independent. Further in standard LASSO, 
there are more columns than rows in X. In our case, there are n times more 
rows. 
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Figure 1: Top: The sparse cross-correlations are estimated by minimizing 
the LI cost function (4) for 4 different sparse parameters A (Proposition 1). 
The edge weights shrinked to zero are removed. Bottom: the equivalent bi¬ 
nary graph can be obtained by the simple soft-thresholding rule (Proposition 
2), i.e., simply thresholding the sample cross-correlations at A. The number 
of clusters (denoted as jf) and the size of the largest cluster (denoted as k) 
are displayed over different A values. 

Proposition 1 generalizes the sparse correlation case given in (Chung 
et ah, 2013). Figure 1-top displays an example of obtaining sparse cross¬ 
correlations from the initial sample cross-correlation matrix 
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using Proposition 1. For simplicity, only the upper triangle part of the 
sample cross-correlation is demonstrated. 
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2.2 Graph Filtrations 

From estimated sparse cross-correlations /3(A), we will first build the graph 
representation for subsequent brain network quantification. Instead of try¬ 
ing to determine the optimal parameter A and fix A as often done in many 
sparse network models (Peng et ah, 2009; Huang et ah, 2009; Xin et ah, 
2015), we will analyze the sparse networks over every possible A. A single 
optimal parameter may not be universally optimal across different datasets 
or even sufficient. This new approach enables us to build a multiscale net¬ 
work framework over sparse parameter A. 


Definition 1. Suppose weighted graph G = iV, W) is formed by the pair of 
node set V = {ui, • • • , Vp} and edge weights Wpxp = {wij)- L^t ^ and he 
binary operations on weighted graphs such that = {V,W^),W^ = {w^^) 
and = (F, hF+^), IF+^ = {wf/') with 


w°- = 
V 


1 if Wij 0, 
0 otherwise 


and 




0 


if Wij > A; 
otherwise 


The operations ^ and threshold edge weights and make the weighted 
graph binary. 


Definition 2. For graphs Gi = {V,W),W = {wij) and G 2 = {V,Z),Z = 
{zij), G 2 is a subset of Gi, i.e., Gi D G 2 , if Wij > Zij for all i,j. The 
eolleetion of nested subgraphs 


Gi D G2 D • • • D Gfc 


is called a graph filtration, k is the level of the filtration. 

Definition 2 extends the concept of the graph filtration heuristically given 
in Lee et ah (2012) and Chung et ah (2013), where only undirected graphs 
are considered, to more general directed graphs. Graph filtrations is a special 
case of Rips filtrations often studied in persistent homology (Carlsson and 
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Memoli, 2008; Edelsbrunner and Rarer, 2008; Singh et ah, 2008; Christ, 
2008). From Definitions 1 and 2, for arbitrary weighted graph G = (V,W), 
we have 

G+^l 3 ( 7+^2 3 Q+X3 ^ ( 10 ) 

for any Ai < A 2 < A 3 • • •. Figure 1-bottom illustrates a graph filtration 
obtained with "*■ operation. 

Using these concepts, we will build persistent homology on sparse net¬ 
work ^(A) = (U,/3(A)) with sparse cross-correlations /3(A) obtained from 
(5). For this, we will first construct the collection of infinitely many binary 
graphs {^°(A) : A G M+}. 

Proposition 2. Let ^^(A) be the binary graph obtained from sparse network 
G{X) = (U, /3(A)) where /3(A) are sparse correlations. Consider another graph 
LL = {V,p), where the edge weights pij = |x(uj)'y(^j)| the sample eross- 
correlations. Then, we have the following. 

(1) So ft-thresholding rule: ^*^(A) = for all A > 0. 

(2) The collection of binary graphs {^°(A) : A G M+} forms a graph filtration. 

(3) Order edge weights ptj sueh that 

P( 0 ) = 0 < p(i) = min Pij < ■ ■ ■ < p(^q) = maxp^. 

Then, for any A > 0, ^°(A) = for some i. 

See Appendix for proof. Proposition 2-(l) enables us to construct large- 
scale sparse networks by the simple soft thresholding rule. This completely 
bypasses numerical optimizations that have been the main computational 
bottleneck in the field. Figure 1 illustrates the soft-thresholding rule in 
obtaining the equivalent sparse binary graph without optimization. 

Proposition 2-(2) and 2-(3) further show that we can monotonically map 
the collection of constructed graphs {0*^(A) ; A G M+} into finite graph 
filtration 

■^+P(o) 3 ■^+P{i) 3 ... 3 ( 11 ) 
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Figure 2: The result of graph filtrations on twin fMRI data. The number 
of disjoint clusters (left) and the size of largest cluster (right) are plotted 
over the filtration values. At a given filtration value, MZ-twins have smaller 
number of clusters but larger cluster size. 

For a graph with p nodes, the maximum possible number of edges is (p^ — 
p)/2, which is obtained in a complete graph, (p^ — p)/2 + 1 is the upper 
limit for the level of filtration in (11). 

2.3 Statistical Inference on Graph Filtrations 

We propose a new statistical inference procedure for persistent homology. 
The method is applicable to many filtrations in persistent homology includ¬ 
ing Rips, Morse as well as graph filtrations. We apply monotonic graph 
functions as multiscale features for statistical inference. 

Definition 3. Let be the number of connected components (or disjoint 
clusters) of graph G and hG be the size of the largest component (or cluster) 
ofG. 

The graph functions ff and & are monotonic over graph filtrations. For 
filtration Gi D G 2 L) •••, Note #(Gi) < #(^ 2 ) < ••• and &:(Gi) > 
^(^ 2 ) > • • •. Figure 1 illustrates this monotonicity on the 4-nodes sparse 
network. 
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Proposition 3. Let M be the minimum spanning tree (MST) of weighted 
graph G = {V, p) with edge wights p = (pij). Suppose the edge weights 
of M. are sorted as /9(o) = —oo < p(i) < ••• < P{p-i)- Then for all X, 

for some i. 

See Appendix for proof. We do not show it here but an analogue state¬ 
ment can be similarly obtained for &:(G). Proposition 3 can be used to 
monotonically map infinitely many G^^ to only p number of sorted features 

^M+{-oo) < <■■■< (12) 

Similarly, we can monotonically map as 

^M+i-oo) > fcM+AD > ... > (13) 

Figure 2 displays the plot of the number of clusters and the size of the largest 
cluster over filtration values for our twin fMRI data with p = 25972 nodes. 
There are clear group discriminations. The statistical significance for of the 
discrimination can be computed as follows. 

Let Bi{X) be monotonic graph functions, such as ff and over graph 
filtrations Gf^. Then we are interested in testing the null hypothesis Hq of 
the equivalence of the two set of monotonic functions: 

Hq : Ri(A) = B 2 {X) for all A. 

As a test statistic, we propose to use 

Dp = sup |.Bi(A) - B 2 {X)\, 

X 

which can be viewed as the distance between two graph filtrations. The 
test statistic Dp is related to the two-sample Kolmogorove-Smirnov (KS) 
test statistic (Chung et ah, 2013; Gibbons and Chakraborti, 2011; Bohm 
and Hornik, 2010). The test statistic takes care of the multiple comparisons 
over sparse parameters so there is no need to apply additional multiple 
comparisons correction procedure. The asymptotic probability distribution 
of Dp can be determined. 
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Proposition 4. limp_j,oo P{Dp/yj2{p - 1) < d) = 1-2 

See Appendix for proof. Note Proposition 4 is nonparametric test and 
does not assume any statistical distribution other than that Bi and B 2 are 
monotonic. From Proposition 4, p-value under Hq is computed as 

p-value = 2e"'^° - • • • , 

where do is the observed value of Dp/ y^2(p — 1) in the data. For any large 
observed value do > 2, the second term is in the order of 10“^^ and insignif¬ 
icant. Even for small observed do, the expansion converges quickly. 

2.4 Validation Study 

The proposed method was validated in two simulations with the known 
ground truths. The simulations were performed 1000 times and the average 
results were reported. There are three groups and the sample size is re = 20 
in each group and the number of nodes are p = 100. In Group 1 and 2, data 
Xk{vi) at node Vi was simulated as standard normal, i.e., Xk{vi) ~ V(0,1). 
The paired data was— simulated as jjkivi) = Xk{vi) + V(0,0.02^) for all 
the nodes. The simulated networks can be found in Figure 3. Following the 
proposed method, we obtained the p-values of 0.712±0.331 and 0.462±0.413 
for the number of clusters and the size of largest cluster. We could not detect 
any group differences as expected. 

Group 3 was generated identically but independently like Group 1 and 
2 but additional dependency was added. For ten nodes indexed by j = 
1,2,-- - ,10, we simulated yk{vj) = Xk{vi) -|-iV(0, 0.02^). This dependency 
gives high connection differences between Groups 1 and 3. The simulated 
network can be found in Figure 3. We obtained the p-values of 0.025 ±0.020 
and 0.004 ± 0.010 for the number of clusters and the size of largest cluster 
demonstrating that we can detect the network differences as expected. 
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Figure 3: Simulation studies. Three groups are simulated. Group 1 and 
Group 2 are generated independently but identically. The resulting B{\) 
plots are similar. The dotted red line is the number of clusters and the 
solid black line is the size of the largest cluster. No statistically significant 
differences can be found between Groups 1 and 2. Group 3 is generated 
independently but identically as Group 1 but additional dependency is added 
for the first 10 nodes. The resulting B{X) plots show statistically significant 
differences. This simulation is repeatedly done 1000 times. 

3 Application 

3.1 Twin fMRI Dataset 

The study consists of 11 monozygotic (MZ) and 9 same-sex dizygotic (DZ) 
twin pairs of 3T functional magnetic resonance images (fMRI) acquired 
in Intera-Achiava Phillips MRI scanners with a state-of-the-art 32 channel 
SENSE head coil. Research was approved through the Vanderbilt Univer¬ 
sity Institutional Review Board (IRB). Blood oxygenation-level dependent 
(BOLD) functional images were acquired with a gradient echoplanar se- 
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quence, with 38 axial-oblique slices (3 mm thick, 0.3 mm gap), 80 x 80 in 
plane resolution, TR (repetition time) 2000 ms, TE (echo time) 25 ms, flip 
angle = 90. Subjects completed monetary incentive delay task (Knutson 
et ah, 2001) of 3 runs of 40 trials. A total 120 trials consisting of 40 $0, 40 
$1 and 40 $5 rewards were pseudo randomly split into 3 runs. The aim of 
the task is to rapidly respond to a target in order to earn rewards. Trials 
begin with a cue indicating the amount of money that’s at stake for that 
specific trial, then there is a delay period between the cue and the target 
in which participants prepare for the target to appear. Then a white star 
(target) flashes rapidly on the screen. If the participants hit the response 
button while the star is on the screen win the money. If they hit too late, 
they do not win the money. After the response, and a brief delay a feedback 
slide is shown indicating to the participants whether they hit the target on 
time or not and the amount of money they made for that specific trial. 

The fMRI dimensions after preprocessing are 53 x 63 x 46 and the voxel 
size is 3mm cube. There are a total of p = 25972 voxels in the brain tem¬ 
plate. Figure 4 shows the outline of the template. fMRI data went through 
spatial normalization to a template following the standard SPM pipeline 
(Penny et ah, 2011). Images were processed with Gaussian kernel smoothing 
with 8 mm full-width-at-half-maximum (FWHM) spatially. Neuroscientifi- 
cally, we are interested in knowing the extent of the genetic influence on the 
functional brain networks of these participants while anticipating the high 
reward as measured by activity during the delay that occurs between the 
reward cue and the target. After fitting a general linear model at each voxel 
(Friston et al., 1995; Penny et ah, 2011), we obtained contrast maps testing 
the significance of activity in the delay period for $5 trials relative to the no 
reward control condition (Figure 4). The contrast maps were then used as 
the initial data vectors x and y in our starting model (1). 

/Users/moochung/Google Drive/twinstudy/papers/arxiv.v2-2016/chung.2016.06.29.pdf 
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MZ-twins 


DZ-twins 



Figure 4: Representative MZ- and DZ-twin pairs of the contrast maps ob¬ 
tained from the first level analysis testing the significance of the delay in 
hitting the response button in $5 reward in contrast to $0 reward. Middle: 
The correlation of the contrast maps within each twin types. Bottom: The 
heritability index measures twice the difference between the correlations. 

3.2 Interchangeablility of Data 

The sparse cross-correlations /3 = {Pij) are asymmetric and induce di¬ 
rected binary graphs. Suppose there is no preference in data vectors x 
and y and they are interchangeable. Our twin data is one of those inter¬ 
changeable cases since there is no preference of one twin over the other. 
Then we have another linear model, where x and y are interchanged in 
(1): x(uj) = Cij y{vi) + e. The LSE of is % = y'(ui)x(uj). The 
symmetric version of (sample) cross-correlation C, = {Cij) is then given by 
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2Cij = bij + Cij = x'{vi)y{vj) + y'{vi)x{vj). Similarly, the symmetric version 
of sparse cross-correlation r] = (rjij) is obtained as 2r] = argmin,g F(I3] x, y)-|- 
argmin^ F( 7 ; y, x). For our study, each symmetrized cross-correlation ma¬ 
trix requires computing 1.3 billion entries and 5.2GB of computer memory 
(Figure 5). Since the cross-correlation matrices are very dense, it is diffi¬ 
cult to provide biological interpretation and interpretable data visualization. 
That is the main biological reason we developed the proposed sparse model 
to reduce the number of connections and come up with meaningful inter¬ 
pretation. All the statements and methods presented here are applicable 
to symmeterized cross-correlations as well. To see this, define the sum of 
graphs with identical node set V as follows: 

Definition 4. For graphs Gi = {V, W) and G 2 = (G, Z), 

Gi + G 2 = {V,W + Z). 

With Definition 4, we can show that the sum of any two graph filtrations 
will be again a graph filtration. Given two graph filtrations 

Gi c G2 c • • • c Gfc 


and 

we also have Gi + Hi G G 2 + H 2 C ■ ■ ■ G Gk + Hk- 

Once we obtain the two sparse cross-correlations /3 and 7 , construct 
weighted graphs ^i(A) = (G,/3(A)) and ^ 2 (A) = (G, 7 (A)). Then construct 
graph filtrations ^i(A)^ and ^ 2 (A)^. Using sample cross-correlations X'Y 
and Y'X, we can also define weighted graphs Hi = (U, X'Y) and H 2 = 
(U,Y'X). Then construct binary graphs H^^ and H^^. We have already 
shown that ^j’(A) = Hi^ and G^i^) = . Thus, we have 

ai°(A) + gO(A) = Ht^ + n+^ (14) 

and (14) forms a graph filtration over A. Thus, Proposition 2 and other 
methods can be extended to the sum of filtrations ^i(A) -|- ^2 (A) as well. 
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Figure 5: Cross-correlations for MZ- and DZ-twins for all 25972 voxels in 
the brain template. There are more than 0.67 billion cross-correlations be¬ 
tween voxels. Since the correlation matrices are too dense to visualize and 
interpret, it is necessary to reduce the number of connections. The diagonal 
entries are Pearson correlations. 

3.3 Heritability Graph Index 

The proposed methods are applied in extending the voxel-level univariate 
genetic feature called heritability index (HI) into a network-level multivariate 
feature called heritability graph index (HGI). HGI is then used in determin¬ 
ing the genetic effects by constructing large-scale HGI by taking every voxel 
as network nodes. HI determines the amount of variation (in terms of per¬ 
centage) due to genetic influence in a population. HI is often estimated using 
Falconer’s formula (Falconer and Mackay, 1995). MZ-twins share 100% of 
genes while DZ-twins share 50% genes. Thus, the additive genetic factor A, 
the common environmental factor C for each twin type are related as 


Puzi^i) 

= A + C, 

(15) 

PoziVi) 

= AI2 + C., 

(16) 


where p^z and Pdz are the pairwise correlation between MZ- and and same- 
sex DZ-twins. Solving (15) and (16) at each voxel Vi, we obtain the additive 


19 





















genetic factor or HI given by 


Hlj — 2\p^Y2^(vi) — Pnzivi)]. 

HI is a univariate feature measured at each voxel and does not quantify if 
the change in one voxel is related to other voxels. We can extend the concept 
of HI to the network level by defining the heritability graph index (HGI): 

HGIjj = 2[Q^iT,{vi,Vj) - QT,z{vi,Vj)], 

where £>mz and ^dz are the symmetrized cross-correlations between voxels 
Vi and Vj for MZ- and DZ-twin pairs. Note that QMzivi,Vi) = Puzivi) and 
QDz{vi,Vi) = Puz{vi) and the diagonal entries of HGI is HI, i.e., HGIjj = HIj. 
HGI measures genetic contribution in terms of percentage at the network 
level while generalizing the concept of HI. In Figure 6-top and -middle, 
the nodes are correlations Puzivi) and Puz{vi) while the edges are cross¬ 
correlations QMzivi,Vj) and QY>z{vi-,Vj) respectively. In Figure 6-bottom, the 
nodes are HI while edges are off-diagonal entries of HGI. 

To obtain the statistical significance of the HGI, we computed statistic 
Dp and used Proposition 4. For our data, the observed do is 2.40 for the 
number of clusters and 23.12 for the size of largest cluster, p-values are less 
than 0.00002 and 0.00001 respectively indicating very strong signihcance of 
HGI. 

We further checked the validity of the proposed method by randomly 
permuting twin pairs such that we pair two images if they are not from the 
same twin. These pairs are split into half and the sparse cross-correlations 
and HGI are computed following the proposed pipeline. It is expected that 
we should not detect any signal beyond random chances. Figure 7 shows an 
example of sparse cross-correlation for one set of randomly paired images. As 
expected, even at sparse parameter A = 0.5, there are not many significant 
connections. At sparse parameter A = 0.7, there is no significant connections 
at all. In comparison, we are detecting extremely dense connections for both 


20 



Figure 6: Top and Middle: Node colors are correlations of MZ- and DZ- 
twins. Edge colors are sparse cross-correlations at given sparsity A. Bottom: 
node colors are heritability indices (HI) and edge colors are heritability graph 
index (HGI). MZ-twins show higher correlations at both voxel and network 
levels compared to DZ-twins. Some low HI nodes show high HGI. Using 
only the voxel-level HI feature will fail to detect such subtle genetic effects 
on the functional network. 


MZ- and DZ-twins clearly demonstrating the high cross-correlations are due 
to the group labeling and not due to random chances. 


4 Conclusion 

In this paper, we explored the computational issues of constructing large- 
scale brain networks within a unified mathematical framework integrating 
sparse network model building, parameter estimation and statistical infer- 
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Figure 7; Top and Middle: Node colors are correlations of MZ- and DZ- 
twins. Edge colors are sparse cross-correlations at given sparsity A. Bottom: 
node colors are heritability indices (HI) and edge colors are heritability graph 
index (HGI). MZ-twins show higher correlations at both voxel and network 
levels compared to DZ-twins. Some low HI nodes show high HGI. Using 
only the voxel-level HI feature will fail to detect such subtle genetic effects 
on the functional network. 

ence on graph filtrations. Although the method is applied in twin fMRI data 
to address the problem of determining the genetic contribution to functional 
brain networks, it can be applied to other more general problems of correlat¬ 
ing paired data such as multimodal fMRI to PET. We believe the theoretical 
framework presented here provides a motivation for future works on various 
fields dealing with paired functional data and networks. 
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Appendix 

Proof to Proposition 1. Write F(/3;x, y,A) as 


F(/3;x,y,A) = \Y1 
*j=i 

where 

fiPij) =11 yivj) - Pij ^{vi) f +2A|^ij|. 

Since f{Pij) is nonnegative and convex, F{/3;x,y, X) is minimum if each 
component f{/3ij) achieves its minimum. So we only need to minimize each 
component f{(3ij) separately. Using the constraint ||x(ui)|| = 1, fi'jjk) is 
simplified as 


1 - 2/3ijx{viyy{vj) + Pfj + 2\\(3ij\ 

= iPij - ^(viYyivj)? + 2A|/3ij| + 1 - [xiviYyivj)]"^. 


For A = 0, the minimum of f{(3ij) is achieved when /3ij = x(ui)'x(uj), 
which is the usual least squares estimation (LSE). For A > 0, Since fiPij) 
is quadratic, the minimum is achieved when 
d f 

^ = 2l3ij-2x\vi)y{vj)±2X = 0. (17) 

Pi j 

The sign of A depends on the sign of ftij. By rearranging terms, we obtain 
the desired result. □ 


Proof to Proposition 2. (1) The adjacency matrix A = (a^) of ^°(A) is 
given by aij = 1 if fiij{X) / 0 and Oij = 0 otherwise. From Proposition 1, 
3ij(A) / 0 if |x(uj)'y(uj)| > A and '^ij{X) = 0 if |x(ui)'y('(^i)| - Thus, 
adjacency matrix A is equivalent to another adjacency matrix B = (bij) 
given by 


bij{X) = 



if \pij\ > A; 
otherwise 


(18) 
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On the other hand, the adjacency matrix of binary graph is exactly 
given by B. Thus ^°(A) = 

(2) For all 0 < Ai < A 2 < • • •, ftij(Ai) > bij{X 2 ) > • • • for all i,j. Hence, 

^+Ai ^ ^+A2 3 ... 

and equivalently, ^^(Ai) D G^{X 2 ) !)•••• So the collection of ^*^(A) forms a 
graph filtration. 

(3) If 0 = p(o) < A < p(i), gO(A) = n^Pw, If < A < gO(A) = 

BX^Pii) ioT 1 < i < q — For < A, ^°(A) = , the node set. Thus 

^‘^(A) = for any A > 0 for some i. □ 

Proof to Proposition 3. For a tree with p nodes, there are p — 1 edges with 
edge weights sorted as 

P(i) ~ — P{2) ^ ^ P(p-2) ^ P{p-i) — niaxpij. 

No edge weights are above P{p-i), thus when thresholded at P(p-i), ah the 
edges are removed and end up with node set V. Thus, = p. 

Since all edges are above 0, = 1. 

For any graph G with p nodes and any A, 1 < < p. Thus, 

the ranges of ifG~^^ and match. There exists some p(j) satisfying 

ifG^^ = for some i. Note + 1. Therefore, 

#G'+^ < #M+^(*+i). □ 


Proof to Proposition f. Let Mi be the MST of graph Gi. From the proof 
of Proposition 3, the range of function exactly matches the range of 

function Bi{X). Thus, we have 


sup |Hi(A) - H 2 (A)| = sup 
A i 


Therefore, the problem is equivalent to testing the equivalence of two set of 
paired monotonic vectors and , • • • , 
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Combine these two vectors and arrange them in increasing order. Represent 
and as X and y respectively. Then the combined vectors 

can be represented as, for example, xxyxyx • • •. Treat the sequence as walks 
on a Cartesian grid, x indicates one step to the right and y indicates one 
step up. Then from grid (0,0) to {p — l,p — 1), there are total {^pZi) pos¬ 
sible number of paths. Then following closely the argument given in pages 
192-194 in Gibbons and Chakraborti (2011) and Bohm and Hornik (2010) 
we have 


P 


Dp 

p — 1 



(?-T 


(19) 


where -|- Ap_ 2 ,p-i with boundary conditions ^(0,p — 

1) = A[p— 1,0) = 1. Then asymptotically (19) can be written as (Smirnov, 
1939) 


lim 

p^oo 


Dp 

p — 1 



= 1 -2^(-l)*-ie-2*"‘^'. □ 

i=l 
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